CEM RIZALAR
TÜMAY KIR
ÇAĞATAY NÜFER
ONUR DURMUŞ
INTRODUCTION
Statistical forecasting refers to making predictions of future values based on past data.
It can draw on time series, cross-sectional, or longitudinal data. For the purposes of this
project, time series methods are used to estimate the future sales of eight products sold on
Trendyol.com. Before the analysis, the raw data must be converted into time series form. Time
series analysis helps extract the information relevant to the product being forecasted, along
with characteristics such as seasonality, trend, and autocorrelation. Once the analysis is under
way, the critical points of the data become visible. After the analysis, several candidate
models should be fitted to the data and compared on accuracy measures. These measures, such as
the mean absolute percentage error (MAPE) and the mean absolute error (MAE), point to the model
that best fits the product's data. Moving on to model selection, critical points can then be
eliminated if they are judged to distort future values. For example, if a holiday season has a
crucial effect on sales, that effect should be removed from the model so that it does not carry
over into regular daily sales. Finally, once the elimination is done, every relevant factor is
included, and a model is settled on, it can be used to forecast future sales. In this project,
forecasting is carried out in the R programming language.
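The two accuracy measures named above can be computed directly from a model's forecasts. The sketch below (in Python rather than the project's R code, and with made-up sales figures) shows how MAE and MAPE compare actual and forecasted values:

```python
def mae(actual, forecast):
    """Mean absolute error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual = [120, 135, 150, 160]    # hypothetical daily sales
forecast = [110, 140, 145, 170]  # hypothetical model output

print(round(mae(actual, forecast), 2))   # → 7.5 (average miss in units sold)
print(round(mape(actual, forecast), 2))  # → 5.41 (average miss as a percentage)
```

The model with the lowest values of these measures on held-out data is the one selected.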
MODELS
Several models can be used to fit time series data. These include moving average
models, exponential moving average models, the naive model, autoregressive models, ETS
models, ARIMA models, single and multiple regression models, dynamic regression models,
and approaches for generating coherent forecasts such as the bottom-up, top-down, and
middle-out approaches.
Moving average models employ linearity to forecast a future value and are used for
univariate time series. The predictions are based solely on a linear combination of past and
current data. As a variant of the moving average model, the exponential moving average model
applies weights to the past data: the most recent observations receive the largest weights
when forecasting future values. Moving average models are a type of naive model. Naive models
require no underlying assumptions beyond the past data itself; factors such as
autocorrelation, seasonality, trend, and special dates are not included in them.
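The two averaging forecasts just described can be sketched in a few lines. This is an illustration (in Python, with an invented sales series), not the project's R implementation:

```python
def moving_average_forecast(series, window):
    """Naive moving-average forecast: the next value is the plain
    mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: more recent observations get
    larger weights, controlled by the factor alpha in (0, 1]."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

sales = [100, 102, 98, 105, 110, 108]  # hypothetical daily sales

print(moving_average_forecast(sales, window=3))          # mean of the last 3 values
print(exponential_smoothing_forecast(sales, alpha=0.5))  # weighted toward recent data
```

Note how the exponential version reacts faster to the recent rise in sales than the plain average, because the older, lower values carry exponentially smaller weights.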
Contrary to moving average models, autoregressive models do not necessarily have to
be stationary. Autoregressive models are used to explain random occurrences. The model
suggests that the forecast depends on a linear combination of past data and a stochastic
term. A stochastic term is a random term, and it cannot be predicted exactly.
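The dependence on past values plus a stochastic term can be made concrete with the simplest case, AR(1), where each value is a linear function of the one before it. The sketch below (a Python illustration with an invented series, not the project's R code) estimates the coefficients by ordinary least squares on lagged pairs:

```python
def fit_ar1(series):
    """Estimate y_t = c + phi * y_{t-1} by least squares; return (c, phi)."""
    x = series[:-1]   # lagged values y_{t-1}
    y = series[1:]    # current values y_t
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
           / sum((xi - mx) ** 2 for xi in x))
    c = my - phi * mx
    return c, phi

def forecast_ar1(series, c, phi):
    """One-step-ahead forecast; the stochastic term has mean zero,
    so it drops out of the point forecast."""
    return c + phi * series[-1]

sales = [100, 104, 103, 107, 106, 110, 109]  # hypothetical daily sales
c, phi = fit_ar1(sales)
print(forecast_ar1(sales, c, phi))
```

Because the stochastic term is unpredictable, only its average effect (zero) enters the point forecast; the term itself shows up as the spread of the forecast errors.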
Autoregressive models are a component of autoregressive moving average (ARMA) and
autoregressive integrated moving average (ARIMA) models. The ARIMA model is crucial in our
case: it benefits the forecast by removing non-stationarity through differencing, applied
once or several times depending on the data. ARIMA models are generally denoted
ARIMA(p, d, q), where p is the number of time lags (the order of the autoregressive part),
d is the number of times the data is differenced to remove non-stationarity, and q is the
order of the moving average part. ARIMA models can be